Estimating Gene Expression and Codon-Specific Translational Efficiencies, Mutation Biases, and Selection Coefficients from Genomic Data Alone
Abstract
Extracting biologically meaningful information from the continuing flood of genomic data is a major challenge in the life sciences. Codon usage bias (CUB) is a general feature of most genomes and is thought to reflect the effects of both natural selection for efficient translation and mutation bias. Here we present a mechanistically interpretable Bayesian model, the ribosome overhead costs Stochastic Evolutionary Model of Protein Production Rate (ROC SEMPPR), to extract meaningful information from patterns of CUB within a genome. ROC SEMPPR is grounded in population genetics and allows us to separate the contributions of mutational biases and natural selection against translational inefficiency on a gene-by-gene and codon-by-codon basis. Until now, the primary disadvantage of similar approaches was the need for genome-scale measurements of gene expression. Here, we demonstrate that it is possible to extract accurate estimates of codon-specific mutation biases and translational efficiencies while simultaneously generating accurate estimates of gene expression, rather than requiring such information as input. We demonstrate the utility of ROC SEMPPR using the Saccharomyces cerevisiae S288c genome. When we compare our model fits with previous approaches, we observe exceptionally high agreement between estimates of both codon-specific parameters and gene expression levels ([Formula: see text] in all cases). We also observe strong agreement between our parameter estimates and those derived from alternative data sets. For example, our estimates of mutation bias and those from mutational accumulation experiments are highly correlated ([Formula: see text]). Our estimates of codon-specific translational inefficiencies are also highly correlated with tRNA copy number-based estimates of ribosome pausing time ([Formula: see text]), and our estimates of gene expression are highly correlated with mRNA and ribosome profiling footprint-based estimates ([Formula: see text]), supporting the hypothesis that selection against translational inefficiency is an important force driving the evolution of CUB. Surprisingly, we find that for particular amino acids, codon usage in highly expressed genes can still be largely driven by mutation bias, and that failing to take mutation bias into account can lead to the misidentification of an amino acid's "optimal" codon. In conclusion, our method demonstrates that an enormous amount of biologically important information is encoded within genome-scale patterns of codon usage, and that accessing this information requires not gene expression measurements but carefully formulated, biologically interpretable models.
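The abstract does not state the model's functional form, so the following is only a minimal sketch of how mutation bias and selection against translational inefficiency could jointly shape codon frequencies. It assumes a multinomial-logit form in which each synonymous codon has a mutation-bias term and an inefficiency term that is scaled by gene expression. The function name `codon_probabilities` and the parameter names `delta_m`, `delta_eta`, and `phi` are hypothetical, and the numbers are invented for illustration rather than taken from any fit.

```python
import math


def codon_probabilities(delta_m, delta_eta, phi):
    """Return the probability of each synonymous codon for one amino acid.

    Assumed form (illustrative only): p_i is proportional to
    exp(-delta_m[i] - delta_eta[i] * phi), with the reference codon
    fixed at delta_m = 0 and delta_eta = 0.

    delta_m   -- per-codon mutation-bias terms (hypothetical values)
    delta_eta -- per-codon translational-inefficiency terms (hypothetical)
    phi       -- expression level of the gene, phi >= 0
    """
    if len(delta_m) != len(delta_eta):
        raise ValueError("delta_m and delta_eta must have the same length")
    weights = [math.exp(-m - eta * phi) for m, eta in zip(delta_m, delta_eta)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Codon 0 is the reference (most efficient) codon.
    # Codon 1 is favored by mutation (delta_m < 0) but slightly less efficient.
    delta_m = [0.0, -1.0]
    weak_selection = [0.0, 0.05]
    strong_selection = [0.0, 0.5]

    for phi in (0.0, 1.0, 10.0):
        weak = codon_probabilities(delta_m, weak_selection, phi)
        strong = codon_probabilities(delta_m, strong_selection, phi)
        print(f"phi={phi:5.1f}  weak: {weak[1]:.3f}  strong: {strong[1]:.3f}")
```

Under these assumed values, the mutationally favored codon remains the most frequent one even at high expression when its inefficiency term is small, whereas a larger inefficiency term reverses the ranking. This mirrors the abstract's point that the most frequent codon in highly expressed genes is not necessarily the most efficient one.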
Similar resources
Estimating gene expression and codon specific translational efficiencies, mutation biases, and selection coefficients from genomic data
The time and cost of generating a genomic dataset is expected to continue to decline dramatically in the upcoming years. As a result, extracting biologically meaningful information from this continuing flood of data is a major challenge in biology. In response, we present a powerful Bayesian MCMC method based on a nested model of protein synthesis and population genetics. Analyzing the patterns...
Estimating selection on synonymous codon usage from noisy experimental data.
A key goal in molecular evolution is to extract mechanistic insights from signatures of selection. A case study is codon usage, where despite many recent advances and hypotheses, two longstanding problems remain: the relative contribution of selection and mutation in determining codon frequencies and the relative contribution of translational speed and accuracy to selection. The relevant target...
Codon bias patterns in photosynthetic genes of halophytic grass Aeluropus littoralis
Codon bias refers to differences in the frequency of occurrence of synonymous codons in coding DNA. Patterns of codon usage and optimal codon utilization differ significantly among organisms. These differences result from the long-term action of natural selection and evolution. Genetic drift, mutation, and regulation of gene expression are the main causes of codon bias. In this stu...
scnRCA: A Novel Method to Detect Consistent Patterns of Translational Selection in Mutationally-Biased Genomes
Codon usage bias (CUB) results from the complex interplay between translational selection and mutational biases. Current methods for CUB analysis apply heuristics to integrate both components, limiting the depth and scope of CUB analysis as a technique to probe into the evolution and optimization of protein-coding genes. Here we introduce a self-consistent CUB index (scnRCA) that incorporates i...
Base Composition and Translational Selection are Insufficient to Explain Codon Usage Bias in Plant Viruses
Viral codon usage bias may be the product of a number of synergistic or antagonistic factors, including genomic nucleotide composition, translational selection, genomic architecture, and mutational or repair biases. Most studies of viral codon bias evaluate only the relative importance of genomic base composition and translational selection, ignoring other possible factors. We analyzed the codo...